Topic Extraction from News Archive Using TF*PDF Algorithm
Abstract
Too busy to digest the news archive? Since the Web became widespread, the amount of information available online, especially in news archives, has proliferated and threatens to overwhelm human attention. In response, we propose an information system that extracts the main topics from a news archive on a weekly basis. From the weekly report, a user can learn what the main news events of the past week were.
Similar resources
Trees for Topic Detection
Extracting topic keywords from online text documents is highly significant in text mining applications. In our work, extracted keywords are represented as a hierarchical topic tree. For this, we use an incremental clustering technique for incoming online documents. Moreover, we define a cluster-based measure similar to the TF-IDF measure and a probabilistic inequality to determine subsum...
News Topic Tracking and Re-ranking with Query Expansion Based on Near-Duplicate Detection
The increase in digital storage capacity has enabled the creation of large-scale news video archives. To make full use of such an archive, it is necessary to grasp the development and dependencies of news stories. To address this problem, we investigate methodologies for tracking and re-ranking news stories. The archive used as a test bed consists of more than 30,000 news stories. This paper proposes a nov...
Unsupervised language model adaptation for automatic speech recognition of broadcast news using web 2.0
We improve the automatic speech recognition of broadcast news using paradigms from Web 2.0 to obtain time- and topic-relevant text data for language modeling. We elaborate an unsupervised text collection and decoding strategy that includes crawling appropriate texts from RSS feeds, complementing them with texts from Twitter, language model and vocabulary adaptation, and 2-pass decoding. The...
Discriminative Features Selection in Text Mining Using TF-IDF Scheme
This paper describes a technique for discriminative feature selection in text mining. 'Text mining' is the discovery of new, previously unknown information by computer. Discriminative features are the most important keywords or terms in a document collection, those that describe the informative news it contains. The generated keyword sets are used to discover Association Rules am...
Extracting Named Entities Using Named Entity Recognizer and Generating Topics Using Latent Dirichlet Allocation Algorithm for Arabic News Articles
This paper explains how to extract named entities and topics from Arabic-language news articles. Due to the lack of high-quality tools for Named Entity Recognition (NER) and topic identification for Arabic, we have built an Arabic NER (RenA) and an Arabic topic extraction tool using the popular LDA algorithm (ALDA). NER involves extracting information and identifying types, such as nam...
Publication date: 2002